Privacy-preserving large language models for structured medical information retrieval

Most clinical information is encoded as free text and is therefore not accessible for quantitative analysis. This study presents an open-source pipeline that uses the local large language model (LLM) “Llama 2” to extract quantitative information from clinical text, and evaluates its performance in identifying features of decompensated liver cirrhosis. The LLM identified five key clinical features in a zero- and one-shot manner from 500 patient medical histories in the MIMIC IV dataset. We compared LLMs of three sizes and various prompt engineering approaches, evaluating predictions against ground truth from three blinded medical experts. Our pipeline achieved high accuracy, detecting liver cirrhosis with 100% sensitivity and 96% specificity. Sensitivity and specificity were also high for detecting ascites (95%, 95%), confusion (76%, 94%), abdominal pain (84%, 97%), and shortness of breath (87%, 97%) using the 70 billion parameter model, which outperformed smaller versions. Our study demonstrates the capability of locally deployed LLMs to extract clinical information from free text with low hardware requirements.


Supplementary Table 2 - Consensus for ground truth definition
Medical documentation is often ambiguous. Information does not always correspond to the same level of precision, so agreement among raters was necessary to ensure a consistent definition of ground truth.
Free-text medical documentation is often fuzzy, so mutually exclusive and collectively exhaustive categories cannot always be easily defined.
Supplementary Figure 1 - Example of biases of large language models. The figure displays a modified medical history report from MIMIC IV on the left side, in which the patient's gender is subtly indicated only by the abbreviation "f" and the personal pronoun "she," both highlighted in a red box. On the right side, when Llama 2 is prompted about the patient's gender, it fails to recognize these subtle indications. Instead, it infers the gender as female based on the likelihood of certain conditions mentioned in the report, rather than the explicit gender markers.
Supplementary Figure 2 - Llama-2 Prompt engineering: Integration of System and User Prompts with 7b Model Evaluation. This illustrates the Llama-2 prompt engineering process, highlighting two distinct modules: the system prompt and the user prompt. The system prompt is designed to guide the behavior of the language model, setting the overall context and parameters for interaction. The user prompt then provides detailed, specific instructions and questions, tailoring the model's response to particular tasks or queries. In the first round of prompt engineering with the 7b model, both the system and user prompt were employed.
(1) We included a report and the respective questions about the features in the user prompt and then compared the accuracy of this prompting technique with the following approaches, which include more modules.
(2) Adopting a chain-of-thought (CoT) approach, in which the model was prompted to provide the corresponding text excerpt before answering the questions, together with grammar-forced output of this excerpt, did not enhance the accuracy.
(3) Giving an example deteriorated the accuracy for all features except for shortness of breath.
(4) Providing an additional definition of the features improved the accuracy for the features shortness of breath and abdominal pain.
Npv = Negative predictive value, Ppv = Positive predictive value.
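The split into a system prompt and a user prompt described above can be sketched in a few lines of Python. The `[INST]`, `<<SYS>>` and `<</SYS>>` delimiters and the prompt wording are those that appear in the prompts listed in this supplement. The helper names `build_user_prompt` and `build_llama2_prompt`, and the exact whitespace between modules, are assumptions made for illustration and are not taken from the study's pipeline.

```python
# Illustrative sketch only: helper names and whitespace are assumptions.
SYSTEM_PROMPT = (
    "You are programmed as a cooperative medical assistant. "
    "A patient report will be available to you, and users will request "
    "specific information from this report. Your responses should adhere "
    "rigorously to the information contained within the provided report, "
    "ensuring no fabrication or assumption of details not explicitly stated."
)

QUESTIONS = [
    "From the report, is ascites present at or before patient admission?",
    "From the report, is abdominal pain present at or before admission?",
    "From the report, is shortness of breath present at or before admission?",
    "From the report, is confusion present at or before admission?",
    "From the report, is liver cirrhosis present or suspected at admission?",
]


def build_user_prompt(report: str) -> str:
    """User module: the report followed by the five feature questions."""
    questions = "\n".join(QUESTIONS)
    return (
        f"This is the report:\n\n{report}\n\n"
        f"Now answer following questions:\n{questions}"
    )


def build_llama2_prompt(system_prompt: str, user_prompt: str) -> str:
    """Wrap the system and user modules in Llama 2 chat delimiters."""
    return f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_prompt} [/INST]"


if __name__ == "__main__":
    # "<report text>" is a placeholder, not a real patient report.
    print(build_llama2_prompt(SYSTEM_PROMPT, build_user_prompt("<report text>")))
```

Keeping the two modules in separate functions makes it straightforward to move the report and the questions between the system and the user prompt, which is the kind of variation compared in the prompt engineering rounds.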
Supplementary Figure 3 - Performance with all Modules in System and User Prompt. The first row depicts the prompt engineering approach where all prompt modules were part of the general "system" prompt. In the second row, the prompt modules "report" and "question" were shifted to the "user" prompt. For both strategies, the 70 billion parameter (70B) model performed better. The different prompt module locations, however, had no substantial influence on models' performance. Npv = Negative predictive value, Ppv = Positive predictive value.
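The four metrics reported in these figures follow directly from the confusion matrix of model predictions against the expert ground truth. The sketch below shows the standard definitions; the function name `binary_metrics` and the toy labels are invented for illustration and are not study data.

```python
import math


def binary_metrics(truth, pred):
    """Return sensitivity, specificity, PPV and NPV for paired boolean labels."""
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")
    tp = sum(1 for t, p in zip(truth, pred) if t and p)
    tn = sum(1 for t, p in zip(truth, pred) if not t and not p)
    fp = sum(1 for t, p in zip(truth, pred) if not t and p)
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positives among all positives
        "specificity": ratio(tn, tn + fp),  # true negatives among all negatives
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
    }


# Toy labels invented for illustration; these are not study data.
truth = [True, True, True, False, False, False, False, False]
pred = [True, True, False, False, False, False, False, True]
metrics = binary_metrics(truth, pred)
```

With these toy labels there are 2 true positives, 1 false negative, 4 true negatives and 1 false positive, so sensitivity and PPV are both 2/3 and specificity and NPV are both 4/5.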

MAIN ZERO SHOT PROMPT
[INST] <<SYS>> You are programmed as a cooperative medical assistant. A patient report will be available to you, and users will request specific information from this report. Your responses should adhere rigorously to the information contained within the provided report, ensuring no fabrication or assumption of details not explicitly stated.
This is the report:

{}
Now answer following questions:
From the report, is ascites present at or before patient admission?
From the report, is abdominal pain present at or before admission?
From the report, is shortness of breath present at or before admission?
From the report, is confusion present at or before admission?
From the report, is liver cirrhosis present or suspected at admission?
These are the definitions:
Abdominal pain refers to any discomfort or pain that occurs in the abdominal area. It may sometimes be abbreviated as "abd pain" in medical contexts. The pain can also be specifically located and described by its region: Epigastric: Near the upper-middle region of the abdomen. RUQ: Right Upper Quadrant. RLQ: Right Lower Quadrant. LUQ: Left Upper Quadrant. LLQ: Left Lower Quadrant. If a 10-point review of systems (ROS) does not indicate any issues (is described as negative) and "abdominal pain" or its abbreviation are not explicitly mentioned in a medical report, it indicates that the patient does not have abdominal pain per the context provided in the definition.
Shortness of breath (also known as SOB or dyspnea) refers to difficulty breathing. If it occurs during physical activity, it's referred to as dyspnea on exertion (DOE). If a 10 point review of systems (ROS) is negative (i.e., does not indicate any abnormality or issue) and the terms "dyspnea," "SOB," or "DOE" are not otherwise mentioned in a medical report, this is taken to mean the patient is not experiencing shortness of breath according to the context given.
Confusion is a mental state characterized by disorientation and an inability to think clearly, often manifesting as difficulty remembering, making decisions, and maintaining awareness of critical aspects such as time, place, and personal identity. In medical contexts, the concept of orientation is pivotal. 'Oriented x4' indicates that an individual is lucid and aware of four key domains: person (awareness of oneself), place (recognition of physical location), time (understanding of the day, date, and/or time), and situation (comprehension of the ongoing events or circumstances). Consequently, being 'Oriented x4' signifies the absence of confusion. Conversely, if orientation is noted as less than 4, e.g., 'oriented x3', confusion is presumed present. Furthermore, impaired vigilance, exemplified when a patient is only intermittently responsive, is also indicative of confusion. Practical examples from medical reports might include phrases such as 'pt has brief period of confusion' or 'alert-oriented x3', suggesting episodes or states of confusion within the patient's condition.

One-Shot Prompt
You are programmed as a cooperative medical assistant. A patient report will be available to you, and users will request specific information from this report. Your responses should adhere rigorously to the information contained within the provided report, ensuring no fabrication or assumption of details not explicitly stated. You will be given an example, then proceed with the report provided.

<</SYS>> [/INST]
This is the report: ___ HCV cirrhosis c/b ascites, hiv on ART, h/o IVDU, COPD, \nbioplar, PTSD, presented from OSH ED with worsening abd \ndistension over past week.\nPt reports self-discontinuing lasix and spirnolactone ___ weeks \nago, because she feels like \"they don't do anything\" and that \nshe \"doesn't want to put more chemicals in her.\" She does not \nfollow Na-restricted diets. In the past week, she notes that she \nhas been having worsening abd distension and discomfort. She \ndenies ___ edema, or SOB, or orthopnea. She denies f/c/n/v, d/c, \ndysuria. She had food poisoning a week ago from eating stale \ncake (n/v 20 min after food ingestion), which resolved the same \nday. She denies other recent illness or sick contacts. She notes \nthat she has been noticing gum bleeding while brushing her teeth \nin recent weeks. she denies easy bruising, melena, BRBPR, \nhemetesis, hemoptysis, or hematuria.\nBecause of her abd pain, she went to OSH ED and was transferred \nto ___ for further care. Per ED report, pt has brief period of \nconfusion - she did not recall the ultrasound or bloodwork at \nosh. She denies recent drug use or alcohol use. She denies \nfeeling confused, but reports that she is forgetful at times.\nIn the ED, initial vitals were 98.4 70 106/63 16 97%RA \nLabs notable for ALT/AST/AP ___ ___: ___, \nTbili1.6, WBC 5K, platelet 77, INR 1.6
Now answer following questions:
From the report, is ascites present at or before patient admission?
From the report, is abdominal pain present at or before admission?
From the report, is shortness of breath present at or before admission?
From the report, is confusion present at or before admission?
From the report, is liver cirrhosis present or suspected at admission?

[INST]
This is the report:

{{REPORT}}
Now answer following questions:
From the report, is ascites present at or before patient admission?
From the report, is abdominal pain present at or before admission?
From the report, is shortness of breath present at or before admission?
From the report, is confusion present at or before admission?
From the report, is liver cirrhosis present or suspected at admission?
You are programmed as a cooperative medical assistant. A patient report will be available to you, and users will request specific information from this report. Your responses should adhere rigorously to the information contained within the provided report, ensuring no fabrication or assumption of details not explicitly stated.
Provide an excerpt from text first, then answer the questions.

<</SYS>>
Please extract the following information from the text:
Is ascites present at admission? Provide an excerpt from text, then answer the question.
Is abdominal pain present at or before admission? Provide an excerpt from the text, then answer the question.
Is shortness of breath present at or before admission? Provide an excerpt from the text, then answer the question.
Is confusion present at or before admission? Provide an excerpt from the text, then answer the question.
Is liver cirrhosis present or suspected at admission? Provide an excerpt from the text, then answer the question.

[/INST]
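Because this prompt asks for a text excerpt before each answer, the excerpts can be checked against the report to flag fabricated evidence. The sketch below is one way to do that. It assumes, purely for illustration, that the model's output is constrained to a JSON object with one entry per feature containing an `excerpt` string and a `present` boolean; this layout, the feature keys and the function name `parse_cot_answer` are hypothetical and are not taken from the study.

```python
import json

# Hypothetical feature keys; the study's actual output schema is not shown here.
FEATURES = (
    "ascites",
    "abdominal_pain",
    "shortness_of_breath",
    "confusion",
    "liver_cirrhosis",
)


def _normalize(text: str) -> str:
    """Lower-case and collapse whitespace so line breaks do not block matching."""
    return " ".join(text.lower().split())


def parse_cot_answer(raw: str, report: str) -> dict:
    """Validate an assumed JSON answer and check each excerpt against the report."""
    data = json.loads(raw)
    report_norm = _normalize(report)
    result = {}
    for feature in FEATURES:
        entry = data[feature]  # raises KeyError if the model skipped a feature
        excerpt, present = entry["excerpt"], entry["present"]
        if not isinstance(excerpt, str) or not isinstance(present, bool):
            raise ValueError(f"unexpected types for {feature!r}")
        excerpt_norm = _normalize(excerpt)
        result[feature] = {
            "present": present,
            "excerpt": excerpt,
            # False when the excerpt is empty or not found in the report text.
            "excerpt_in_report": bool(excerpt_norm) and excerpt_norm in report_norm,
        }
    return result
```

An answer whose excerpt does not occur in the report can then be routed to manual review rather than being counted as a confident prediction.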
Definition Prompt
[INST] <<SYS>> You are programmed as a cooperative medical assistant. A patient report will be available to you, and users will request specific information from this report. Your responses should adhere rigorously to the information contained within the provided report, ensuring no fabrication or assumption of details not explicitly stated.

{}
Now answer following questions:
From the report, is ascites present at or before patient admission?
From the report, is abdominal pain present at or before admission?
From the report, is shortness of breath present at or before admission?
From the report, is confusion present at or before admission?
From the report, is liver cirrhosis present or suspected at admission?
These are the definitions:
Ascites refers to the accumulation of excess fluid in the peritoneal cavity, which is the space between the organs and the abdominal wall, often resulting from liver disease, heart failure, or cancer.
Abdominal pain refers to any discomfort or pain that occurs in the abdominal area. It may sometimes be abbreviated as "abd pain" in medical contexts. The pain can also be specifically located and described by its region: Epigastric: Near the upper-middle region of the abdomen. RUQ: Right Upper Quadrant. RLQ: Right Lower Quadrant. LUQ: Left Upper Quadrant. LLQ: Left Lower Quadrant. If a 10-point review of systems (ROS) does not indicate any issues (is described as negative) and "abdominal pain" or its abbreviation are not explicitly mentioned in a medical report, it indicates that the patient does not have abdominal pain per the context provided in the definition.
Shortness of breath: Shortness of breath (also known as SOB or dyspnea) refers to difficulty breathing. If it occurs during physical activity, it's referred to as dyspnea on exertion (DOE). If a 10 point review of systems (ROS) is negative (i.e., does not indicate any abnormality or issue) and the terms "dyspnea," "SOB," or "DOE" are not otherwise mentioned in a medical report, this is taken to mean the patient is not experiencing shortness of breath according to the context given.
Confusion is a mental state characterized by disorientation and an inability to think clearly, often manifesting as difficulty remembering, making decisions, and maintaining awareness of critical aspects such as time, place, and personal identity. In medical contexts, the concept of orientation is pivotal. 'Oriented x4' indicates that an individual is lucid and aware of four key domains: person (awareness of oneself), place (recognition of physical location), time (understanding of the day, date, and/or time), and situation (comprehension of the ongoing events or circumstances). Consequently, being 'Oriented x4' signifies the absence of confusion. Conversely, if orientation is noted as less than 4, e.g., 'oriented x3', confusion is presumed present. Furthermore, impaired vigilance, exemplified when a patient is only intermittently responsive, is also indicative of confusion. Practical examples from medical reports might include phrases such as 'pt has brief period of confusion' or 'alert-oriented x3', suggesting episodes or states of confusion within the patient's condition.
Liver cirrhosis: Is a late stage of scarring (fibrosis) of the liver caused by many forms of liver diseases and conditions, such as hepatitis and chronic alcoholism, leading to loss of liver function and potential complications like bleeding, jaundice, and hepatic encephalopathy. Examples: HCV cirrhosis, decompensated alcoholic and Hepatitis C cirrhosis, ETO cirrhosis. [/INST]
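The orientation rule stated in the confusion definition ('Oriented x4' means no confusion, anything below 4 means confusion is presumed) can be made concrete with a small rule-based sketch. In the study this definition is given to the LLM as part of the prompt; the regular expression and the function name `confusion_from_orientation` below are assumptions for illustration only and are not part of the study's pipeline.

```python
import re
from typing import Optional

# Matches phrases such as "oriented x4", "Oriented x 3" or "alert-oriented x3".
ORIENTATION = re.compile(r"oriented\s*x\s*(\d+)", re.IGNORECASE)


def confusion_from_orientation(text: str) -> Optional[bool]:
    """Apply the 'oriented xN' rule from the confusion definition.

    Returns True if any orientation score below 4 is documented,
    False if all documented scores are 4 or higher, and None if the
    text contains no orientation statement at all.
    """
    scores = [int(score) for score in ORIENTATION.findall(text)]
    if not scores:
        return None
    return min(scores) < 4
```

A rule like this covers only the orientation part of the definition. Phrases such as 'pt has brief period of confusion' or descriptions of impaired vigilance contain no orientation score, so the function returns None for them.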